Abstract

Small-world networks, i.e. networks displaying both a high clustering coefficient and a small characteristic path length, are ubiquitous in nature. Since their identification, the “small-worldness” metric proposed by Humphries and Gurney has frequently been used to detect this structural property in real-world complex networks, to a large extent in the study of brain dynamics. Here I discuss several of its drawbacks, including its lack of definition in disconnected networks and the impossibility of assessing its statistical significance, and I present alternative formulations that overcome these difficulties, validated through the phenospaces representing a set of 48 real networks.
1. Introduction

Small-worldness [1] is a complex network property that has received a huge amount of interest in the last decade. Artificial networks were initially created around two different paradigms: random graphs, in which the existence of a link is the result of a random process [2]; and regular ones, whose nodes all have the same number of connections. It was nevertheless soon discovered that many real technological, biological and social networks fall in the middle: they show high local clustering (e.g. triangles), like regular networks, but also short path lengths between elements, characteristic of random graphs.


Preprint submitted to Physica A 


May 15, 2015 




With the aim of providing an objective measure of the small-world nature of real networks, Humphries and Gurney [3] proposed in 2008 a metric based on the celebrated Watts-Strogatz (WS) model. The small-worldness structural measure S was defined as the ratio between the clustering coefficient C and the characteristic path length L, each normalised by the value expected in equivalent random networks:

$$ S = \frac{C / C_{\mathrm{rand}}}{L / L_{\mathrm{rand}}} \tag{1} $$

Here, $C_{\mathrm{rand}}$ and $L_{\mathrm{rand}}$ respectively represent the clustering coefficient and the characteristic path length observed in equivalent random networks, i.e. random networks with the same number of nodes and links as the one under study. As a network is said to be small-world when $L \gtrsim L_{\mathrm{rand}}$ and $C \gg C_{\mathrm{rand}}$, $S > 1$ indicates the presence of this property.
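To make Eq. (1) concrete, here is a minimal Python sketch using only the standard library. It rests on assumptions of mine, not on the original MATLAB code: graphs are dictionaries mapping integer node labels to sets of neighbours, C is the mean local clustering coefficient, L is averaged over all ordered pairs of nodes, and the equivalent random networks are G(n, m) graphs that are redrawn until connected. All function names are hypothetical.

```python
import math
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over ordered pairs; math.inf if disconnected."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < n:
            return math.inf
        total += sum(dist.values())
    return total / (n * (n - 1))


def connected_gnm_graph(n, m, rng, max_tries=100):
    """G(n, m) random graph, redrawn until connected (an assumption of this sketch)."""
    for _ in range(max_tries):
        adj = {i: set() for i in range(n)}
        edges = 0
        while edges < m:
            i, j = rng.randrange(n), rng.randrange(n)
            if i != j and j not in adj[i]:
                adj[i].add(j)
                adj[j].add(i)
                edges += 1
        if len(bfs_distances(adj, 0)) == n:
            return adj
    raise ValueError("no connected random graph found: the network is too sparse")


def small_worldness(adj, n_random=20, seed=0):
    """Eq. (1): S = (C / C_rand) / (L / L_rand)."""
    L = characteristic_path_length(adj)
    if math.isinf(L):
        raise ValueError("S is undefined: the network is disconnected")
    rng = random.Random(seed)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    randoms = [connected_gnm_graph(n, m, rng) for _ in range(n_random)]
    c_rand = sum(average_clustering(r) for r in randoms) / n_random
    l_rand = sum(characteristic_path_length(r) for r in randoms) / n_random
    return (average_clustering(adj) / c_rand) / (L / l_rand)


# Usage: ring lattice (60 nodes, 5 neighbours per side) plus 5 long-range shortcuts.
graph = {i: set() for i in range(60)}
for i in range(60):
    for step in range(1, 6):
        graph[i].add((i + step) % 60)
        graph[(i + step) % 60].add(i)
for i in (0, 12, 24, 36, 48):
    graph[i].add((i + 30) % 60)
    graph[(i + 30) % 60].add(i)
print(f"S = {small_worldness(graph):.2f}")
```

Note that the sketch refuses disconnected networks outright, and has to reject disconnected random graphs to obtain a finite $L_{\mathrm{rand}}$: both are symptoms of the limitation discussed in the following sections.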

Since its introduction, the small-world metric as defined in Eq. (1) has been applied to the study of a large number of real systems, with special attention devoted to the human brain, both in normal and pathological conditions […]. The reasons for such interest become clear when one considers that small-worldness synthesises two important aspects of network (and brain) dynamics: local interconnectivity, through C, i.e. the creation of groups of nodes strongly and redundantly connected to each other; and global integration, through L, representing the movement of information across large distances. The balance between short- and long-range connectivity is altered in Alzheimer’s disease, both in patients and in control subjects carrying genetic variations used as biomarkers [8]; similar results were also obtained for individuals suffering from Mild Cognitive Impairment […], the prodromal stage of Alzheimer’s disease. Beyond biology, small-worldness has been applied to the analysis of terrorist social networks […] and of audio-clip sharing communities […], and even as a criterion for organising datacenters […], among others.

In spite of its popularity, this metric presents several drawbacks. First, it is defined only for connected networks: when disconnected components are present, a common situation in real systems, L diverges to infinity. Second, the normalisation with respect to random networks does not yield information about the significance of the obtained value. In this contribution, I address these two problems by presenting two alternative formulations of the small-world metric. They are respectively based on the concept of efficiency (Section 2), which, while conveying the same information as L, is defined even for disconnected networks; and on that of ZScore (Section 3), enabling a better estimation of the statistical significance of results. Both metrics are then tested against a set of 48 real networks, covering social, biological and technological systems […]. Section 4 finally draws some conclusions and recommendations on the use of small-worldness for the analysis of real networks.

2. Efficiency vs. Characteristic path length

The first problem in the application of the small-world metric appears when the network under analysis is not connected, i.e. when a path cannot be constructed between some of its nodes. For those pairs the distance is infinite, so the characteristic path length L diverges and thus $S \to 0$. Even if the original graph is composed of a single component, the random networks used for normalisation may be disconnected, especially when the link probability p is below the threshold $p = \ln n / n$, n being the number of nodes [2].
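The threshold $p = \ln n / n$ can be checked numerically. The sketch below (Python, standard library only; function names, n = 100 and the number of trials are illustrative choices of mine) estimates the fraction of connected G(n, p) random graphs at half and at twice the threshold; the first fraction should be close to 0 and the second close to 1.

```python
import math
import random
from collections import deque


def gnp_random_graph(n, p, rng):
    """Erdos-Renyi G(n, p): each of the n(n-1)/2 links exists with probability p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj):
    """Breadth-first search from an arbitrary node must reach every node."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


def fraction_connected(n, p, trials=50, seed=0):
    rng = random.Random(seed)
    return sum(is_connected(gnp_random_graph(n, p, rng)) for _ in range(trials)) / trials


n = 100
threshold = math.log(n) / n
below = fraction_connected(n, 0.5 * threshold)
above = fraction_connected(n, 2.0 * threshold)
print(f"threshold = {threshold:.4f}, connected below: {below:.2f}, above: {above:.2f}")
```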

It is worth noting that this situation frequently appears in the study of real systems, and especially in the study of brain dynamics. For instance, the functional network […] representing a given cognitive task may not connect all brain regions, as some of them may not be involved in the computation. Two solutions can then be adopted. First, one can consider only the giant component of the network, i.e. the largest group of nodes forming a connected subgraph; the resulting networks may nevertheless have different numbers of nodes and represent different parts of the brain, making any comparison between subjects and tasks difficult. Second, networks can be created by applying dynamical thresholds that ensure the connectivity of the network, at the price of possibly including links without biological value [22].
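The first option, restricting the analysis to the giant component, can be sketched as follows (Python, standard library; the adjacency-dictionary representation and function names are assumptions for illustration). The example makes visible how the number of nodes changes after the extraction, which is what hampers comparisons between subjects.

```python
from collections import deque


def connected_components(adj):
    """List of node sets, one per connected component (breadth-first search)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


def giant_component(adj):
    """Sub-graph induced by the largest connected component."""
    largest = max(connected_components(adj), key=len)
    return {u: adj[u] & largest for u in largest}


# Usage: a 3-node path, a 2-node pair and an isolated node.
example = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
giant = giant_component(example)
print(f"{len(example)} nodes -> giant component with {len(giant)} nodes")
```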

Here I propose a different approach, which stems from the use of a distance metric that is well defined even for disconnected networks. This metric, called efficiency [23, …], is defined as:

$$ E(\mathcal{G}) = \frac{1}{n(n-1)} \sum_{i \neq j \in \mathcal{G}} \frac{1}{d_{ij}} \tag{2} $$

The efficiency is thus the inverse of the harmonic mean of all shortest path lengths $d_{ij}$, i and j being nodes of the graph $\mathcal{G}$, normalised so that $0 \le E \le 1$. Notice that when the graph is fully disconnected, $d_{ij} \to \infty$ for every pair and thus $E \to 0$. Since the efficiency is inversely related to the shortest distance between nodes, i.e. $E \sim 1/L$, the small-world metric can be redefined as:

$$ S^E = \frac{C}{C_{\mathrm{rand}}} \cdot \frac{E}{E_{\mathrm{rand}}} \tag{3} $$

which has the advantage of being defined independently of the connectedness of the network.

Figure 1: Comparison between the small-worldness S and its alternative formulations, through the phenospace of 48 real networks. (Left) S vs. $S^E$, with the red dashed line representing the identity $S = S^E$. (Center) S vs. $S^{Z*}$. (Right) S vs. $C/C_{\mathrm{rand}}$. In all cases, green circles and blue squares respectively represent connected and disconnected networks; for the latter, S has been calculated over their giant component. The Y axis is common to all three graphs, allowing the same network to be followed through them, as symbolised by the horizontal dashed arrows.

Fig. 1 (Left) compares S and $S^E$ through the phenospace created by the 48 real networks considered here, where each point in the plane represents a network and its coordinates are given by the values of both metrics. In the case of disconnected networks (blue squares throughout Fig. 1), S has been calculated only on their giant connected component. For connected networks (green circles), $S^E$ approximates S very well (notice the red dashed line corresponding to $S = S^E$). On the other hand, disconnected networks deviate from the identity relation, both above and below it. For these networks, S is biased by the removal of loosely connected nodes, which decreases both L (thus increasing S) and C (thus decreasing S). $S^E$ maintains the original meaning of S, while providing a better assessment of the structure of disconnected networks.
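Eqs. (2) and (3) can be sketched in a few lines of Python (standard library only). As an assumption of this sketch, graphs are dictionaries mapping integer labels to neighbour sets, the random ensemble is G(n, m), and all function names are hypothetical. No connectivity check is needed: unreachable pairs never appear in the breadth-first search and thus contribute $1/\infty = 0$ to the efficiency, so the metric stays finite for the disconnected example below.

```python
import math
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def global_efficiency(adj):
    """Eq. (2): mean of 1/d_ij over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))


def gnm_random_graph(n, m, rng):
    """Uniform G(n, m) random graph; it may well be disconnected."""
    adj = {i: set() for i in range(n)}
    edges = 0
    while edges < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and j not in adj[i]:
            adj[i].add(j)
            adj[j].add(i)
            edges += 1
    return adj


def small_worldness_efficiency(adj, n_random=20, seed=0):
    """Eq. (3): S^E = (C / C_rand) * (E / E_rand); works for disconnected graphs."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    randoms = [gnm_random_graph(n, m, rng) for _ in range(n_random)]
    c_rand = sum(average_clustering(r) for r in randoms) / n_random
    e_rand = sum(global_efficiency(r) for r in randoms) / n_random
    return (average_clustering(adj) / c_rand) * (global_efficiency(adj) / e_rand)


# Usage: two disjoint rings of 30 nodes, each node linked to 3 neighbours per side.
disconnected = {i: set() for i in range(60)}
for offset in (0, 30):
    for i in range(30):
        for step in range(1, 4):
            a, b = offset + i, offset + (i + step) % 30
            disconnected[a].add(b)
            disconnected[b].add(a)
s_e = small_worldness_efficiency(disconnected)
print(f"S^E = {s_e:.2f}")
```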
3. ZScore vs. expected values

The second problem addressed here is the definition of a normalisation procedure that can yield information about the statistical significance of results. As in Eq. (1), this is usually performed by normalising C and L by their expected values $C_{\mathrm{rand}}$ and $L_{\mathrm{rand}}$, as observed in an ensemble of random networks with the same number of nodes and links. While $C/C_{\mathrm{rand}}$ and $L/L_{\mathrm{rand}}$ represent how far the obtained values are from what is expected, they do not provide information about the statistical relevance of such values. Suppose two networks $\mathcal{G}_1$ and $\mathcal{G}_2$, such that $C(\mathcal{G}_1) = 0.05$, $C_{\mathrm{rand}}(\mathcal{G}_1) = 0.025$, $C(\mathcal{G}_2) = 1$ and $C_{\mathrm{rand}}(\mathcal{G}_2) = 0.5$. In both cases,

$$ \frac{C(\mathcal{G}_1)}{C_{\mathrm{rand}}(\mathcal{G}_1)} = \frac{C(\mathcal{G}_2)}{C_{\mathrm{rand}}(\mathcal{G}_2)} = 2. \tag{4} $$

An attentive eye would nevertheless observe that the clustering coefficient of $\mathcal{G}_2$ is more unusual, as a perfect clustering $C(\mathcal{G}_2) = 1$ is hardly expected in random networks.

In order to highlight the difference between both situations, and thus to assess the statistical significance of S, I propose here the use of the ZScore, a standard method for calculating the p-value of a measurement given a Gaussian reference distribution:

$$ \mathrm{ZScore}(M) = \frac{M - \langle M_{\mathrm{rand}} \rangle}{\sigma(M_{\mathrm{rand}})} \tag{5} $$


M represents the metric under analysis, $M_{\mathrm{rand}}$ a set of values obtained in equivalent random networks, and $\langle \cdot \rangle$ and $\sigma(\cdot)$ respectively their average and standard deviation. Large positive and negative values of the ZScore (respectively above 2 and below −2) represent statistically significant high and low observed values. The small-world metric can then be reformulated as follows:

$$ S^Z = \mathrm{ZScore}(C) - \mathrm{ZScore}(L) \tag{6} $$

or, including the efficiency in the definition, as:

$$ S^{Z,E} = \mathrm{ZScore}(C) + \mathrm{ZScore}(E). \tag{7} $$
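A minimal sketch of Eq. (5) and of the efficiency-based variant ZScore(C) + ZScore(E) of Eq. (7), again in Python with the standard library. Using the sample standard deviation for σ and G(n, m) random graphs are assumptions of this sketch, and the function names are hypothetical. It also includes the signed logarithm $\mathrm{sign}(x) \cdot \log_{10}|x|$, useful for plotting ZScores that span many orders of magnitude. Eq. (6) follows the same pattern with L in place of E and a minus sign, but then requires connected networks.

```python
import math
import random
import statistics
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def global_efficiency(adj):
    """Eq. (2): mean of 1/d_ij over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    total = sum(1.0 / d for s in adj for t, d in bfs_distances(adj, s).items() if t != s)
    return total / (n * (n - 1))


def gnm_random_graph(n, m, rng):
    """Uniform G(n, m) random graph (possibly disconnected)."""
    adj = {i: set() for i in range(n)}
    edges = 0
    while edges < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and j not in adj[i]:
            adj[i].add(j)
            adj[j].add(i)
            edges += 1
    return adj


def zscore(value, reference):
    """Eq. (5): (M - <M_rand>) / sigma(M_rand), using the sample standard deviation."""
    return (value - statistics.mean(reference)) / statistics.stdev(reference)


def small_worldness_zscore(adj, n_random=50, seed=0):
    """ZScore(C) + ZScore(E), as in Eq. (7); also returns the two ZScores."""
    rng = random.Random(seed)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    randoms = [gnm_random_graph(n, m, rng) for _ in range(n_random)]
    z_c = zscore(average_clustering(adj), [average_clustering(r) for r in randoms])
    z_e = zscore(global_efficiency(adj), [global_efficiency(r) for r in randoms])
    return z_c + z_e, z_c, z_e


def signed_log10(x):
    """sign(x) * log10|x|, for plotting values spanning many orders of magnitude."""
    if x == 0:
        return 0.0
    return (1.0 if x > 0 else -1.0) * math.log10(abs(x))


# Usage: ring lattice (60 nodes, 5 neighbours per side) plus 5 long-range shortcuts.
graph = {i: set() for i in range(60)}
for i in range(60):
    for step in range(1, 6):
        graph[i].add((i + step) % 60)
        graph[(i + step) % 60].add(i)
for i in (0, 12, 24, 36, 48):
    graph[i].add((i + 30) % 60)
    graph[(i + 30) % 60].add(i)
s_ze, z_c, z_e = small_worldness_zscore(graph)
print(f"ZScore(C) = {z_c:.1f}, ZScore(E) = {z_e:.1f}, sum = {s_ze:.1f}")
print(f"signed log10 of the sum = {signed_log10(s_ze):.2f}")
```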

Fig. 1 (Center) depicts the relation between S and $S^Z$. As the ZScore can assume extreme values, $S^Z$ is represented there through its logarithm, i.e. $S^{Z*} = \mathrm{sign}(S^Z) \cdot \log_{10} |S^Z|$. It can be noticed that similar values of S, e.g. between 0.0 and 1.0, can have very different statistical significances ($S^{Z*}$ between 0 and −12, i.e. $S^Z$ between 0 and $-10^{12}$, the latter corresponding to extremely small p-values).

Figure 2: Small-worldness S and its weighted version $S^w$. Through the different panels composing this Figure, it is possible to compare pairs of metrics, including S, L, C, and their weighted versions $S^w$ and $L^w$. When the X and Y axes are common to two panels, it is possible to follow the same network through them, as symbolised by the dashed grey arrows.

4. Conclusions and discussion 

This contribution discusses the problem of numerically evaluating the small-world property of a complex network, i.e. identifying structures that are characterised both by a high clustering coefficient and a small shortest path distance between nodes. The metric originally proposed by Humphries and Gurney [3] presents two main drawbacks, namely its inability to handle disconnected networks, and the little information it provides about the statistical significance of results. Here I show how these two problems can be solved respectively by including the efficiency of the network, as a proxy of the inverse of the geodesic distance between nodes, and by using a ZScore instead of a simple normalisation, which allows assessing the p-value of the observations.

Beyond the problem of correctly quantifying the presence of a small-world structure, a researcher analysing real networks should also be careful to understand the origin of that property. Independently of the specific metric used, the calculation of S (as well as of $S^E$ and $S^Z$) implies a reduction of the information available, from a two-dimensional space (given by C and L, or C and E) to a one-dimensional one: information is thus always lost in the process. One remaining task is therefore to understand from which network property the small-worldness arises, i.e. from a higher than expected clustering coefficient, or from a shorter than expected characteristic path length. In order to shed light on this issue, Fig. 1 (Right) depicts the relation between S and $C/C_{\mathrm{rand}}$, as observed in the 48 real networks considered here. The red line represents the best linear fit between them ($S = 1.07 \cdot C/C_{\mathrm{rand}} - 0.42$, $R^2 = 0.972$): the small-world property is thus largely explained by the clustering coefficient, making S mostly redundant in the understanding of those systems. This problem is especially relevant in small networks, such as the representations created from EEG and MEG recordings of brain activity […], in which L is largely constrained.
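A linear fit and its coefficient of determination of the kind reported above can be obtained with ordinary least squares. The sketch below is a generic, dependency-free implementation with a hypothetical function name; the input values in the usage lines are synthetic, chosen to lie exactly on y = 2x + 1, and are not the data of the 48 networks.

```python
def linear_fit(x, y):
    """Ordinary least squares fit y = a * x + b; returns (a, b, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot


# Usage with synthetic values (x would be C/C_rand per network, y would be S).
normalised_clustering = [1.0, 2.0, 3.0, 4.0]
small_worldness_values = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r2 = linear_fit(normalised_clustering, small_worldness_values)
print(f"S = {slope:.2f} * C/C_rand + {intercept:.2f}, R^2 = {r2:.3f}")
```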

I suggest solving this latter problem by including, within the definition of the small-worldness, a weighted connectivity metric that strongly penalises the presence of pairs of nodes whose distance is greater than the one expected in a small-world network, i.e. $d > \ln n$. This requires, first, normalising the distance between pairs of nodes as follows:

$$ \hat{d}_{ij} = \frac{d_{ij}}{\ln n}, \tag{8} $$

n being the number of nodes in the network. L can then be redefined as:

$$ L^w = \frac{1}{n(n-1)} \sum_{i \neq j} \hat{d}_{ij}^{\,w}, \tag{9} $$




$w > 1$ being a parameter used to penalise long paths, i.e. those longer than $\ln n$. The small-worldness metric can then be updated accordingly, by introducing the weighted measures $L^w$ and $L^w_{\mathrm{rand}}$ into Eq. (1). Fig. 2 depicts, throughout its panels, the phenospaces created by pairs of metrics, including both standard (C, L and S) and weighted ones ($L^w$ and $S^w$), for $w = 3$. Of special relevance are the two central panels, in which S (top) and $S^w$ (bottom) are compared with the normalised clustering coefficient; it can be appreciated that the weighted small-worldness introduces more variability, C being no longer enough to explain the network structure.
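The weighted path length can be sketched as below (Python, standard library). Both the normalisation $\hat d_{ij} = d_{ij}/\ln n$ and the averaging of $\hat d_{ij}^{\,w}$ over ordered pairs reflect my reading of Eqs. (8) and (9); the redrawing of disconnected random graphs and all function names are further assumptions. Treat this as an illustration, not as the author's MATLAB implementation. Since $\hat d_{ij} > 1$ exactly when $d_{ij} > \ln n$, raising it to a power $w > 1$ inflates the contribution of long paths.

```python
import math
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def weighted_path_length(adj, w=3.0):
    """Mean of (d_ij / ln n) ** w over ordered pairs (a reading of Eqs. (8)-(9))."""
    n = len(adj)
    scale = math.log(n)  # distances are normalised by ln n
    total = 0.0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < n:
            return math.inf  # like L, this is infinite for disconnected networks
        total += sum((d / scale) ** w for t, d in dist.items() if t != s)
    return total / (n * (n - 1))


def connected_gnm_graph(n, m, rng, max_tries=100):
    """G(n, m) random graph, redrawn until connected (an assumption of this sketch)."""
    for _ in range(max_tries):
        adj = {i: set() for i in range(n)}
        edges = 0
        while edges < m:
            i, j = rng.randrange(n), rng.randrange(n)
            if i != j and j not in adj[i]:
                adj[i].add(j)
                adj[j].add(i)
                edges += 1
        if len(bfs_distances(adj, 0)) == n:
            return adj
    raise ValueError("no connected random graph found: the network is too sparse")


def weighted_small_worldness(adj, w=3.0, n_random=20, seed=0):
    """Eq. (1) with L and L_rand replaced by their weighted versions."""
    lw = weighted_path_length(adj, w)
    if math.isinf(lw):
        raise ValueError("the weighted small-worldness needs a connected network")
    rng = random.Random(seed)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    randoms = [connected_gnm_graph(n, m, rng) for _ in range(n_random)]
    c_rand = sum(average_clustering(r) for r in randoms) / n_random
    lw_rand = sum(weighted_path_length(r, w) for r in randoms) / n_random
    return (average_clustering(adj) / c_rand) / (lw / lw_rand)


# Usage: on a 10-node path most distances exceed ln 10 (about 2.3), so w = 3 inflates L^w.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
print(f"w=1: {weighted_path_length(path, 1):.2f}  w=3: {weighted_path_length(path, 3):.2f}")
```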

In summary, a researcher dealing with small-world complex networks, and wishing to quantify such a topological structure, should choose the metric according to the properties of the network, specifically its connectedness and its size. MATLAB™ source code for all discussed metrics is available at […].